PREFACE


This description of a concept for measuring the productive capacity job performance of
enlisted personnel is part of an on-going Air Force research program to develop the technology
necessary to base selection, classification, and personnel management policies on
empirically-derived job performance data. The effort was conducted under Contract F49650-92-R-S0Q5 by
Systems Research and Applications Corporation for the Manpower and Personnel Research
Division of the Armstrong Laboratory, Human Resources Directorate (AL/HR). The relevant
work unit was 77192405, "Improved Methodology for Productive Capacity Measurement."

PRODUCTIVE CAPACITY: THE CONCEPT,
RESEARCH, AND APPLICATIONS


SUMMARY 

Over the last decade the Air Force has been involved in research aimed at identifying,
developing, and evaluating job performance measurement technologies. In 1989 the Human
Resources Directorate of Armstrong Laboratory began to examine time-based measures of job
proficiency, leading to a new stream of research collectively known as productive capacity (PC).
This report reviews the results of previous PC efforts, comments on the theoretical and practical
issues that result from this work, and provides recommendations for future PC research efforts.

The concept of PC in relation to other criterion concepts and related areas of research
is also discussed. The notions of maximal and typical criteria, technical proficiency versus
contextual criterion domains, and models of job performance are examined, as well as the
literature on time perception/estimation and human learning, in an attempt to identify
implications and lessons for thinking about PC. Suggestions are offered for incorporating these
notions in future PC research.

The final section of this report addresses potential applications of the PC construct in
determining manpower requirements and setting enlistment standards. We review current
practices and possible first steps for implementing PC-based approaches.


I. INTRODUCTION

Personnel researchers have long held a keen interest in performance measurement.
Virtually all aspects of human resource management rely on some form of performance measures
to validate decision tools, such as performance tests, and to provide the basis for personnel
decisions, such as hiring, firing, or promoting an individual. Innumerable studies have dissected
performance measurement issues from every conceivable angle (e.g., psychometrically,
conceptually, legally, practically, econometrically) in a wide variety of work and behavioral
contexts. The military has interest in performance measurement for many of the same reasons
as business: to support proper personnel decisions and enhance the output and operation of the
organization. Productive capacity (PC) is a relatively new approach to performance
measurement. We will briefly review other attempts to measure performance in the military to
set the stage for our discussion of PC research.

A major stimulus for performance measurement research occurred in the late 1970s when
the Armed Services Vocational Aptitude Battery (ASVAB) was miscalibrated to its normative
base. Many people were accepted for enlistment with lower mental aptitude than as tested
(Office of the Assistant Secretary of Defense - Manpower, Reserve Affairs, and Logistics,
1980), focusing attention on the possible repercussions on individual performance and overall
mission capability of the Services. Because there were few methods to measure individual
performance on a wide scale, there were no immediate, reliable answers to the questions about
performance and capability. In 1980 the Assistant Secretary of Defense for Manpower, Reserve
Affairs, and Logistics directed research be performed to measure military job performance and
link enlistment standards directly to performance. Each Service shared in the primary research
as well as investing in specific research issues. The Air Force developed a Job Performance
Measurement System (JPMS) consisting of hands-on and interview work sample tests, job
knowledge tests, and four types of rating forms. Over a 5-year period, instruments were
developed and data were collected for eight enlisted Air Force specialties (AFSs) (Hedge &
Teachout, 1986; Laue, Hedge, Wall, Pedersen, & Bentley, 1992).

The work sample tests, collectively known as the Walk-Through Performance Test
(WTPT), were composed of tasks representative of the job performance domain (20 to 30 tasks
for each AFS). Both hands-on and interview formats required the examinee to accomplish the
tasks at the work setting under the observation of a trained test administrator, who scored each
step in a task as correctly or incorrectly performed.

Four rating forms (task, dimensional, Air Force-wide, and global) were developed to
measure job performance from the very specific to the very general for all eight AFSs.
Supervisors, peers, and job incumbents made performance ratings on a 5-point,
adjectivally-anchored scale. In addition, paper-and-pencil tests of procedural job knowledge were developed
and administered to job incumbents in four AFSs.

Correlations between the hands-on and interview work sample tests were moderate to
high, ranging from .46 to .84 (median r = .68) across the eight AFSs. The correlations
between hands-on tests and knowledge tests ranged between .30 and .56 (median r = .46).
Finally, across self, supervisor, and peer ratings of technical performance (collapsing across the
four types of rating forms), correlations with the hands-on work sample tests ranged from .22
to .31, while ratings of interpersonal effectiveness (also collapsing across rating forms)
correlated with the hands-on work sample tests from .03 to .14 (Hedge & Teachout, 1992).

Aptitude and experience relationships with job performance were also examined. The
median correlation across the eight AFSs between the Armed Forces Qualification Test (AFQT)
score (an ASVAB composite score used by all Services as an indicator of general trainability)
and the hands-on performance was .23 (range = .07 to .32). Correlations between total active
federal military service (a measure of experience) and hands-on performance ranged between .17
and .38 (median r = .33).

While this program of research demonstrated the feasibility of measuring job performance
in a military context, the Air Force continued to examine new approaches to job performance
measurement. The emphasis in the JPMS had been on the measurement of the quality of an
individual's job performance. Raters were asked to judge whether incumbents "never meets",
"occasionally meets", "meets", "frequently exceeds", or "always exceeds" required standards
for quality of work.

In 1989 the Air Force began to examine time-based measures of job performance
(Carpenter, Monaco, O'Mara, & Teachout, 1989). Researchers recognized that the JPMS
quality-based metric was not ideally suited for enlistment standards setting and that an emphasis
on individual differences in the quantity of work performed might prove more useful. This
decision led to initiation of a new stream of research collectively known as productive capacity,
with the expectation that managers might be able to define minimal performance levels more
precisely using a quantity of performance scale that reflects worker productivity. Furthermore,
measures that quantify individual differences in amount of work performed might be useful for
manpower planning purposes. Thus, PC could facilitate the forecasting of aptitude and
experience levels needed to accomplish certain quantities of work.

This report examines the PC construct and reviews the background of PC research, the
purposes for PC measurement, and the conceptual approaches that drive measurement
considerations. A final section addresses potential applications of the PC construct in
determining manpower requirements and setting enlistment standards. We review current
standard setting practices and propose possible first steps for implementing PC-based approaches.


II. REVIEW OF PREVIOUS PRODUCTIVE CAPACITY STUDIES

This section of the report reviews the PC research program to date. In particular, we
review and comment on reports by Carpenter, Monaco, O'Mara, and Teachout (1989); Faneuff,
Valentine, Stone, Curry, and Hageman (1990); Skinner, Faneuff, and Demetriades (1991);
Leighton, Kageff, Mosher, Gribben, Faneuff, Demetriades, and Skinner (1992); Demetriades
and Skinner (1992); Harville and Skinner (1993); and Faneuff (1993).

The technical paper of Carpenter et al. (1989) reported the first attempt to create and
investigate a productive capacity index. The approach to measuring PC in this early study was
quite different from strategies used subsequently in the research program. Also notable was the
attempt to link their PC-related time-to-proficiency (TTP) construct with several of the
quality-based JPMS project performance measures.

Carpenter et al. defined PC for a job incumbent as the ratio of estimated fastest possible
performance time (in minutes) for a cluster of tasks to observed performance time. They
expressed the ratio as PC = t*/t, where t is the observed time and t* is one minute less than the
fastest observed time for all incumbents observed performing a particular task cluster. Note that
the values of PC range from 0 for the slowest (least productive) worker to 1 for the fastest (most
productive) worker. Conceivably, this ratio scale of measurement allows workers' productivities
to be expressed in terms of a proportion of their capability relative to maximum capability. That
is, a worker whose PC = .5 would be capable of producing 50% of the maximum productivity
possible (amount of work per unit of time).
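
As a rough illustration of the t*/t index just described, the following Python sketch computes PC for a handful of made-up task-cluster times. The function name and the sample times are assumptions for illustration only; the one-minute offset for t* follows the definition summarized above.

```python
def productive_capacity(observed_minutes):
    """Return PC = t*/t for each observed completion time.

    t* is taken as one minute less than the fastest observed time,
    as in the definition summarized in the text.
    """
    if not observed_minutes:
        raise ValueError("need at least one observed time")
    fastest = min(observed_minutes)
    if fastest <= 1:
        raise ValueError("fastest time must exceed one minute so that t* > 0")
    t_star = fastest - 1.0
    return [t_star / t for t in observed_minutes]


# Hypothetical completion times (minutes) for one task cluster.
times = [21.0, 30.0, 40.0, 80.0]
pc_scores = productive_capacity(times)
```

With these illustrative numbers t* is 20 minutes, so the airman who needs 40 minutes receives PC = .5. Because t* is set below the fastest observed time, every computed score lies strictly between 0 and 1; the endpoints are limits rather than attainable values.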

Carpenter et al. gathered data from first term incumbents in the Avionics Communication
Specialist AFS. To generate the TTP scores, they first obtained SME estimates of how long the
typical or average airman took to complete tasks in each of 10 task clusters. Then, Carpenter
et al. asked each supervisor of the sample participants to identify for each of 10 task clusters one
of their subordinates whom they believed could accomplish those tasks closest to the average or
"normal" amount of time. The chosen subordinate thus became the benchmark performer, and
the supervisor was asked to estimate how long it would take each of his/her other subordinates
to accomplish the same amount of work as the benchmark performer could do in one hour. This
procedure then generated a time for each airman in the sample for that task cluster. The same
approach was followed for each of the other task clusters.

These TTP indices were therefore subjective estimates of the relative time (against an
average performer) that an airman took to successfully complete tasks. TTP is a quantity-based
rather than quality-based index. It has the added advantage that supervisors can probably
perform the rating task relatively easily because the comparisons to be made are among airmen
whose performance they know well. A problem with the index, addressed in subsequent PC
work, is that the comparisons are not made against the same "standard" performer. Average
or normal performers may vary considerably across work groups regarding the time it takes
them to complete tasks, and thus the comparison airmen (benchmark performers) provide
inconsistent stimuli for making TTP estimates for other airmen.

Nonetheless, the correlations between TTP ratings and JPMS performance ratings are
instructive. For 58 of the airmen in the total sample, supervisor, peer, and self ratings on the
four different JPMS rating forms from the JPMS data collections were available. Although the
authors and others (e.g., Faneuff et al., 1990) were not very positive about these results, we see
some good evidence of construct validity for the TTP ratings. In particular, correlations
between TTP scores and ratings on the more technical proficiency-oriented performance
dimensions were quite substantial and, importantly, considerably higher than correlations
between TTP ratings and ratings on dimensions less conceptually related to technical proficiency
and productive capacity.

For example, focusing on the Global dimension ratings, correlations between TTP and
Technical Proficiency were -.52, -.41, and -.50, respectively, for supervisor, peer, and self
ratings and only -.19, -.17, and -.06 between TTP and Interpersonal Skills for ratings from the
three sources. Similarly, with the Air Force-wide scales, correlations between TTP ratings and
the Technical Knowledge/Skill dimension ratings (the most clearly technical proficiency-oriented
dimension) were -.55, -.66, and -.46, respectively, for supervisor, peer, and self ratings.
Correlations with all other Air Force-wide dimensions were lower. Considering that these
correlations were between ratings generated independently by different raters and separated by
time, and that the TTP ratings reflect an early attempt to measure a PC construct, the research
results provide considerable encouragement for continuing efforts to measure PC.

An additional analysis in the report examined the relationship between TTP ratings and:
(1) aptitude, derived from ASVAB scores; and (2) experience, defined as number of months in
the Air Force. Regression of TTP scores on aptitude and experience scores resulted in an
[...] of .44. The significant beta weights, respectively, were .017 and .027, indicating that
experience level was the more important influence on this index of PC.

The Carpenter et al. (1989) predictor model, including aptitude, productivity, cost, and
attrition, was unable to accommodate more than one AFS simultaneously. Therefore, Faneuff
et al. (1990) used a different methodology to measure PC and addressed its use in setting
ASVAB cut scores when several AFSs were considered simultaneously. Here PC was defined
as the ratio t/t*. Because data were obtained from the JPMS subjects with no direct task
performance times available, t was defined as a subject's total WTPT score (sum of hands-on
and interview scores) and t* was the highest obtained total WTPT score for the respective
specialties (see the earlier discussion of the JPMS for more details on the WTPT score).

Faneuff et al. showed that aptitude (i.e., ASVAB) cut scores for an AFS depended on
both aptitude standards for the AFS and manpower requirements for that and other AFSs. For
example, the Carpenter et al. estimate of the optimal minimum AFQT score for the Avionics
Communication Specialist AFS was 90. Faneuff et al. argued that this cutoff would be lower
if other AFS requirements and the general aptitude level of the recruit pool were considered.
Accordingly, they extended the Carpenter et al. model to consider minimum aptitude standards
under various scenarios incorporating different recruit pool conditions and manning requirements
for other AFSs.

Faneuff et al. recommended more effort toward establishing definitions of optimal
performance regarding not only quantity (e.g., the t*) but also quality (what they termed q*).
This raises an interesting possibility. The concept of a quantity-based ratio of t*/t, or the
minimum time to complete a piece of work divided by the target airman's time to complete that
work, could be extended to quality-based measures. An airman's performance on a dimension
q could be compared to the best possible performance on that dimension q*, and the q*/q ratio
might serve as the quality equivalent to t*/t. Of course, a quality-based ratio has not been used
to date because it is difficult to argue that quality-based ratings can be considered as on a ratio
scale.


The Military Testing Association paper by Skinner et al. (1991) reported on the next
major step toward measuring PC. The earliest report (Carpenter et al., 1989) recognized that
obtaining actual task completion times using hands-on tests was not practical for any large-scale
implementation of a PC measurement system. That was the assumption here as well. Instead,
the goal was to get reliable and valid time-to-complete estimates from supervisors. However,
the authors also recognized that the comparison rating task used by Carpenter et al., although
a good first step toward PC measurement, was conceptually flawed (as discussed previously in
this section). This research attempted to develop a rating scale that allowed raters to compare
their ratees' time-to-complete with an absolute standard for all raters.

To explore the feasibility of developing a rating scale with absolute standards, Skinner
et al. examined a benchmarking idea that would establish actual times to complete a task
successfully at three different competency levels: fastest possible, average or normal, and
slowest possible. The general notion was that if these benchmarks could be reliably determined
and placed as anchors on a rating scale, they might serve as useful reference points for raters
judging the amount of time their airmen ratees spend successfully completing the task.

Accordingly, research proceeded toward developing these anchors for 35-47 tasks from
each of four AFSs. Six SMEs from each AFS participated in workshops to establish estimates
of these times-to-complete for each task. Specifically, SMEs first made independent estimates
of performance times for the fastest possible and the typical or average incumbent and for an
incumbent who was the slowest possible (but not so slow that the task would be assigned to
someone else). Second, the SMEs participated in a modified Nominal Group Technique process
that yielded consensus times for each of the three levels. Here are two example tasks with their
consensus times:


Task                                           Fastest    Normal     Slowest

Replace radio frequency coaxial connector      12 mins.   17 mins.   22 mins.

Perform in-processing of unit personnel         5 mins.    7 mins.   15 mins.


The main finding was that the initial time estimates were reasonably reliable. One-rater
reliabilities ranged from .38 to .81 for individual tasks. Also, the fastest, normal, and slowest
time estimates had about the same level of interrater reliability. Stepped-up 6-rater reliabilities,
appropriate for assessing the reliability of the mean judgments across the six SMEs, were in
the [...] and .90s.

A valuable additional benefit of this research is provided by comparisons of the percent
time changes from normal to fastest and from slowest to normal in different tasks. As shown in
the examples above, for the in-processing task the fastest airman cannot do much better than the
normal airman but the normal airman is likely more than twice as fast as the slowest airman.
Conversely, the variance across the three levels of the other task is quite uniform. In general,
the ratio of the slowest/fastest times and the time differences between adjacent anchors provide
potentially useful information about tasks. For example, we might predict that more difficult,
complex tasks would have larger slowest/fastest time ratios than simpler, more routine tasks.
This ratio could in fact provide a numerical index of task complexity.

At any rate, for the four AFSs studied, the slowest to normal percent decrease in time
varied widely across tasks (6-80%), as did the normal to fastest percent decrease (10-67%).
Interestingly though, the average decreases for the AFSs were very similar and close to the same
for the two comparisons (i.e., slowest-normal and normal-fastest). Those percentages were 39,
37, 37, and 39 for normal to fastest and 32, 31, 32, and 39 for slowest to normal.
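
To make the anchor comparisons concrete, the sketch below computes the two percent decreases and the slowest/fastest ratio for the two example tasks listed earlier (12/17/22 minutes and 5/7/15 minutes). It assumes each percent decrease is taken relative to the slower of the two anchors; the report does not spell out the denominator, so that choice and the function name are illustrative only.

```python
def anchor_summary(fastest, normal, slowest):
    """Summarize the spread among the three benchmark times for one task.

    Assumption: each percent decrease is expressed relative to the
    slower anchor of the pair being compared.
    """
    return {
        "slowest_to_normal_pct": 100.0 * (slowest - normal) / slowest,
        "normal_to_fastest_pct": 100.0 * (normal - fastest) / normal,
        "slowest_fastest_ratio": slowest / fastest,
    }


# Consensus benchmark times (minutes) from the two example tasks.
coax = anchor_summary(fastest=12, normal=17, slowest=22)
in_processing = anchor_summary(fastest=5, normal=7, slowest=15)
```

Under this assumption the coaxial connector task shows decreases of roughly 23% and 29% with a slowest/fastest ratio near 1.8, while the in-processing task shows roughly 53% and 29% with a ratio of 3.0, which matches the observation that its slowest anchor is the outlier.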

The central contribution of this research, however, was a demonstration that performance
time anchors for these three competency levels could be reliably estimated by SMEs. The next
questions involve actually using these scales to help estimate t values for airmen ratees. Can
raters (e.g., supervisors or peers) use these scales to provide reliable, accurate, and valid
time-to-complete estimates for ratee task performance? Subsequent research has addressed these
questions.

The Leighton et al. (1992) report described data collection for the next logical step in the
research program (i.e., are supervisor raters able to make accurate assessments of task
performance time [t] for individual airmen?). Three hundred twenty supervisors in four AFSs
estimated time-to-complete (i.e., performance time) for 680 subordinates on multiple tasks
(36-50 per AFS). For a subset of the tasks (6-11 per AFS), 240 of the 680 subordinates were
actually timed doing hands-on tests associated with those tasks. Their hands-on performance was
also rated on a 5-point scale from 1 = unacceptable to 5 = exceptional. The scale point of 3 was labeled
acceptable and could be used as a cutoff below which times would not be counted. This is
useful for measuring PC because the definition of t includes the notion of successful task
completion.

Leighton et al. collected other potentially useful data for studying PC. Supervisors
providing the time estimates indicated for each task how often they had observed the subordinate
on the task (regularly, occasionally, or never). A fair test of the potential validity of the
performance time estimates might use only data provided by supervisors who regularly observe
the subordinate completing the task. Also, the airmen being tested indicated for each task
whether they regularly, occasionally, or never performed the task. Again, subsequent tests of
the validity of performance time estimates might consider only tasks that the subject airman
regularly performs.

Finally, these researchers administered to members of the subordinate ratee sample, for
three of the four AFSs, a job knowledge test, a vocational interest inventory, and a motivation
scale. In addition, they gathered a supervisor rating of overall productivity for each ratee. This
required the supervisor rater to consider the maximum amount of acceptable work the subordinate
can produce in a day and then indicate the percentage of that amount he/she could typically
be expected to do.

This data collection initiative is a major contribution. First, the time estimation rating
comes from supervisors using the improved benchmark performance time scale, with the
SME-generated fastest, normal, and slowest times to perform each task included as anchors. Second,
the additional measures taken (e.g., job knowledge) can support correlational analyses with the
performance time ratings to provide a clearer picture of the PC construct. Third, data on the
frequency of observation on the part of raters and frequency of performing the task on the part
of ratees might prove very helpful for testing the validity of performance time estimates under
relatively ideal conditions. Finally, the sample sizes are reasonably large and subordinate ratees
are from more than one AFS, rendering any results from this data set relatively generalizable.

The Demetriades and Skinner (1992) APA paper covered some of the earlier PC results.
Demetriades and Skinner showed the interrater reliabilities for the SME benchmarking study.
For the four AFSs studied, the authors presented mean 1-rater and 6-rater (the number of SMEs
used in the benchmarking effort) reliabilities averaged across all 35-47 tasks, separately for the
fastest, normal, and slowest benchmark estimates. One-rater reliabilities ranged from .48 for
the fastest estimates on the Communication and Navigation Systems AFS tasks to .80 for the
fastest estimates on the Personnel AFS tasks. Corresponding 6-rater reliabilities ranged from .85
to .96.
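
The report does not name the formula used to step up 1-rater reliabilities to 6-rater reliabilities. Assuming it was the standard Spearman-Brown prophecy formula, the sketch below shows that the reported 6-rater range follows from the reported 1-rater range; the function name is illustrative.

```python
def spearman_brown(single_rater_reliability, n_raters):
    """Reliability of the mean of n_raters judgments (Spearman-Brown).

    Assumption: the raters are interchangeable, so the single-rater
    reliability applies equally to each of them.
    """
    r = single_rater_reliability
    return n_raters * r / (1.0 + (n_raters - 1) * r)


# One-rater endpoints reported for the benchmarking study.
low = spearman_brown(0.48, 6)
high = spearman_brown(0.80, 6)
```

Rounded to two decimals, these values are .85 and .96, the same endpoints reported for the 6-rater reliabilities.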


In addition, Demetriades and Skinner demonstrated one type of validity for the normal
benchmark estimates. A total of 54 airmen were actually timed performing each of 44 tasks
(about 11 per AFS of the 35-47 tasks in each AFS), and the mean times taken on each task by
these airmen were correlated with the normal benchmark times for the same 44 tasks. The
correlation was quite high (r = .75, p < .0001), indicating that the normal benchmark times on
the rating scales mirror rather closely the average time airmen actually take to complete these
tasks. It should be made clear (as the authors themselves do) that this finding does not pertain
to the validity of performance time ratings of airmen but to the validity or perhaps realism of
the normal benchmark times.

Demetriades and Skinner also reported on the correspondence between the mean time
taken on tasks by the airmen and the normal benchmark times, assessed in terms of absolute
differences between the two. The grand means for the actual times and benchmark estimate
times were quite similar (9.20 and 11.68 minutes, respectively, t = 1.32, not significant).
However, Demetriades and Skinner pointed out that the distributions of scores across the 44
tasks were very different, with the normal benchmark estimates having much greater variance.

This study's findings suggest that the normal benchmarks for the performance time rating
scales are probably realistic, at least in terms of the benchmark times for tasks relative to the
benchmark times for other tasks. Again, this evidence says nothing about validity related to
performance time ratings of airmen made on the benchmarked scales, but it is reassuring that
the benchmarks supervisor raters will use to make those performance time ratings do not appear
to be seriously distorted.

Harville and Skinner (1993) was the first report of performance time rating validities
using individual ratees as the level of analysis. Harville and Skinner evaluated the validity and
accuracy of supervisory performance time ratings. They defined validity as the correlation
between supervisor performance time estimates for individual ratees on a task and the actual time
each ratee took to successfully complete that task. Accuracy was defined as the grand mean
of the performance time ratings across all ratees and tasks for an AFS compared to the grand
mean time those ratees actually spend successfully completing the same tasks.
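
The two definitions can be sketched in a few lines of Python. The data below are invented for a single task and five ratees, and the function names are hypothetical; the sketch only illustrates validity as a Pearson correlation and accuracy as a difference between mean estimated and mean actual times, not the authors' actual computations across tasks and AFSs.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def accuracy_gap(estimated, actual):
    """Mean estimated time minus mean actual time (positive = overestimate)."""
    return sum(estimated) / len(estimated) - sum(actual) / len(actual)


# Hypothetical supervisor estimates and timed results (minutes), one task.
estimated = [12.0, 15.0, 9.0, 20.0, 14.0]
actual = [10.0, 16.0, 8.0, 17.0, 12.0]

validity = pearson_r(estimated, actual)
gap = accuracy_gap(estimated, actual)
```

In this made-up example the supervisors rank the ratees well (a high positive correlation) yet overestimate completion time by 1.4 minutes on average, which shows why the two indices are reported separately.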

For the analyses, the authors obtained actual task completion times for a moderate sample of
airmen in each of four AFSs (up to 61 airmen) on 6 to 11 tasks for each AFS. The airmen who
were administered these hands-on tests were also rated by their supervisors on the benchmarked
performance time estimate scale for each task. Results are depicted in Table 1.

Validities for three of the four AFSs were moderately positive. With more attention paid
to evaluating the validity of performance time ratings on tasks often observed by supervisors and
regularly performed by ratees, these validities might improve considerably. We will discuss this
topic further in a subsequent section. The accuracy results were promising for two of the AFSs.
The results for Avionics Communication would have been more promising, as well, had one of
the seven tasks been removed (one of the task times was seriously overestimated).


Table 1. Performance Time Rating Validities

                              No. of    Validity
AFS                           Tasks     Correlations    Accuracy Results

Personnel                     7         .28, p<.03      1.44 mins. more for estimated
Avionics Communication        7         .29, p<.03      5.01 mins. more for estimated
Aerospace Ground Equipment    6         .23, p<.08      6.0 mins. more for estimated
Aircrew Life Support          11        -.01, n.s.      0.2 mins. less for estimated

Harville and Skinner also investigated the joint effects of experience and ability on the
time-to-complete estimates for six of the tasks from each of the four AFSs. Regression of the
time-to-complete estimate on experience and ability resulted in Rs from .04 to .31 across the 24
tasks, with a mean of .19. The authors indicated that, generally, experience contributed more
than ability to this prediction. It would be enlightening to reestimate the model with actual times
rather than estimated times as the dependent variable, although the sample sizes would be much
smaller.

Faneuff (1993) used data gathered by Leighton et al. (1992) on the Aerospace Ground
Equipment AFS, including performance time rating data for 50 tasks, aptitude scores from the
Mechanical composite of the ASVAB, and experience information (i.e., months in the Air
Force) on 204 airmen with one to six years in the Air Force. He used the t*/t formulation,
which produces an index that can vary from 0 to 1.0, to compute PC scores for individual
airmen on each task. He defined t* for each task as the fastest performance time estimate for
an airman in the sample for that task. The main thrust of his analysis was to regress these PC
scores on ability and experience. Faneuff was also concerned about the problem of how to
weight PC scores on different tasks to create an overall PC score for each airman. He argued
that the most reasonable approach here was to use average time spent data for individual tasks
to derive these weights. After several rescaling adjustments for some outlier PC scores, Faneuff
conducted the regressions using, essentially, a logistic transformation of the rescaled PC scores.
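
The t*/t index and the time-spent weighting can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Faneuff's actual procedure. The function names, task labels, and numbers are hypothetical; the weights are simply taken as proportional to average time spent; and the outlier rescaling and logistic transformation are omitted because their details are not given here.

```python
def pc_scores(times):
    """Return t*/t for each airman on one task.

    t* is the fastest (smallest) performance time estimate in the sample,
    so the fastest airman scores 1.0 and slower airmen score below 1.0.
    """
    if any(t <= 0 for t in times):
        raise ValueError("performance times must be positive")
    t_star = min(times)
    return [t_star / t for t in times]


def overall_pc(pc_by_task, avg_time_spent):
    """Weighted mean of one airman's task PC scores.

    Weights are proportional to the average time spent on each task
    (an assumed form of the time-spent weighting scheme).
    """
    total = sum(avg_time_spent[task] for task in pc_by_task)
    weighted = sum(pc_by_task[task] * avg_time_spent[task] for task in pc_by_task)
    return weighted / total


# Hypothetical performance time estimates (minutes) for three airmen on two tasks.
task_times = {"task_a": [10.0, 20.0, 40.0], "task_b": [5.0, 5.0, 10.0]}
# Hypothetical average time spent on each task, in arbitrary units.
avg_time_spent = {"task_a": 3.0, "task_b": 1.0}

scores = {task: pc_scores(times) for task, times in task_times.items()}
second_airman = {task: scores[task][1] for task in scores}
overall = overall_pc(second_airman, avg_time_spent)
```

In this example the second airman takes twice the fastest time on task_a (PC = .50) and matches the fastest time on task_b (PC = 1.0); because task_a carries three times the weight, the overall score is .625.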

Resulting R²s from the regression analyses for each of the 50 tasks ranged from .01
to .13. Aptitude beta-weights were significantly different from zero for only two of the 50
tasks; for experience, 33 of 50 tasks showed significant beta-weights. In only four of the 50
tasks did a significant aptitude by experience interaction emerge. When PC data were
aggregated across the 50 tasks, employing the average task time spent weighting scheme, the
resulting R² was .16. Experience again had a substantially greater impact than aptitude in
prediction of PC. These results confirmed the Carpenter et al. (1989) findings related to the
relative weights of aptitude and experience in determining PC. It is noteworthy that the .16
value was so much smaller than the .44 R² derived in Carpenter et al. Possible explanations are
that the jobs studied were different, the PC measures were not the same, and the smaller N in
the Carpenter et al. analysis may have led to more of an overestimate of R² than in the Faneuff
study.
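
The sample-size explanation can be made concrete with the standard adjusted R-squared correction, which shrinks an observed squared multiple correlation more when the sample is small relative to the number of predictors. The sketch below is illustrative only. The N of 204 comes from the Faneuff description above; the three predictors (aptitude, experience, and their interaction) and the small-sample N of 30 are our assumptions, not values reported for either study.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n cases and k predictors."""
    if n - k - 1 <= 0:
        raise ValueError("need more cases than predictors plus one")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Observed R-squared of .16 with N = 204; k = 3 predictors is assumed.
large_sample = adjusted_r2(0.16, n=204, k=3)

# Observed R-squared of .44 with a hypothetical N of 30; k = 3 is assumed.
small_sample = adjusted_r2(0.44, n=30, k=3)

# Amount by which each observed value shrinks after adjustment.
shrink_large = 0.16 - large_sample
shrink_small = 0.44 - small_sample
```

Under these assumptions the adjusted values are about .147 and .375, so the small-sample estimate shrinks roughly five times as much. Shrinkage of this size would account for only part of the gap between .44 and .16, which is consistent with the other explanations offered above.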


Across the studies reviewed above, the PC research program has several features that
demonstrate potential for contributing to a practical, useful way of depicting job incumbent and
organizational productivity. The idea of the t*/t ratio for indexing individual incumbent
performance on a task, while simple, is a novel way to evaluate performance. That the t*/t
index is useful for thinking about individual performance in a responsive, mission-related way
and that this index can appropriately address the productivity of units are positive features of the
strategy. The conceptualization of productivity in this manner seems both compelling as a new
approach in performance measurement and useful for certain important applications.


III. REFLECTIONS ON PRODUCTIVE CAPACITY

In this section, we attempt to provide some perspective on the concept of PC by
discussing it in the context of other criterion concepts and related areas of research. The notions
of maximal and typical criteria (Cronbach, 1960; Sackett, Zedeck, & Fogli, 1988), technical
proficiency versus contextual criterion domains (Borman & Motowidlo, 1993), and models of
job performance (e.g., Borman, 1991; Campbell, McCloy, Oppler, & Sager, 1993) are reviewed
briefly as they relate to the PC concept. Also, we discuss literature on time perception and
estimation, as well as on human learning, in an attempt to identify implications for improving
measurement of individual PC.

Typical Versus Maximal Performance

Many years ago Cronbach (1960) made the useful distinction between maximal and
typical performance. He referred to maximal, "can-do" performance as ability-related and
typical, "will-do" performance as driven more by motivational factors than by ability.

Campbell's (1990) model of the determinants of job performance clarifies the distinction.
The model implies that performance is a function of declarative or factual knowledge, procedural
knowledge (i.e., knowing how to do a task), and motivation. Maximal performance involves
the first two components, related to job knowledge, with motivation essentially held constant.
This is because maximal performance measures such as work samples and hands-on performance
tests usually constrain workers to try hard for the short duration of the test. Typical
performance, on the other hand, depends substantially on motivation. Will-do,
performance-over-time requires job knowledge certainly, but also requires sustained, motivated
effort in a setting where motivation is not constrained and can clearly vary across job incumbents.

How does ability fit in here? Performance models offered and confirmed by Hunter
(1983), Schmidt, Hunter, and Outerbridge (1986), and Borman, White, Pulakos, and Oppler
(1991) demonstrate a clear path from ability (i.e., general cognitive ability) to job knowledge
to technical proficiency, where the proficiency variable is a maximum performance measure.
Figure 1 is a portion of the Borman et al. path model derived from data on more than 4,300
first-tour soldiers in nine U.S. Army jobs.


Figure 1. Partial Path Model

Thus, ability appears to influence the acquisition of job knowledge, which in turn impacts
upon maximal performance.

Interestingly, when supervisory performance ratings are included in the above model,
along with selected personality individual differences variables and behavioral variables
reflecting mostly typical performance, the model in Figure 2 emerges (Borman et al., 1991). The
fit of this model is very good, with the adjusted goodness of fit index equal to .976 and a root
mean square residual of .039.

One way to interpret these results, within the framework of maximal/typical performance
and their antecedents, is that the supervisory ratings are likely measures of both maximal and
typical performance. Although the ratings are meant to tap typical performance, raters may
consider can-do performance when rating on such dimensions as Technical Knowledge and Skill.
Accordingly, maximal performance (Technical Proficiency) appears to be a function of ability
and job knowledge, but not a function of personality. Typical performance, as captured in the
ratings, has as antecedents both the ability → job knowledge → technical proficiency sequence of
variables and personality through the typical performance behavioral variables. It may be that
the technical proficiency → ratings path would not be as large if the ratings reflected only typical
performance. Although the interpretation of the results is clearly speculative, the performance
model suggests that maximal and typical performance can be distinguished by their somewhat
different antecedents. Maximal and typical performance, both very important for organizational
functioning, may not be very highly correlated.

Figure 2. Full Path Model

A study by Sackett, Zedeck, and Fogli (1988) provides a more direct picture of the
relationship of maximal and typical performance. Working with grocery clerks, Sackett et al.
used a maximal performance work sample test that consisted of ringing 19 items in a
standardized shopping cart. Measures of speed (total time to complete the cart) and accuracy
(number of voids and incorrect entries) were derived from this test. For the typical performance
measure, a computer-monitored system kept track of each grocery clerk's performance over a
30-day period. Speed (number of items rung per minute) and accuracy (number of voids per
day) were measured. Sackett et al. did not differentiate clerks based on type of register (scanner
or electronic) used; a later study by DuBois, Sackett, Zedeck, & Fogli (1993) did so and those
data are shown in Table 2, for more than 550 grocery clerks on a scanner-type register and for
more than 135 clerks on an electronic-type register.

The correlations between the typical and maximum measures of speed were .32 (p <
.01) on the scanner-type register and .11 on the electronic register, and between typical and
maximum measures of accuracy .12 (p < .01) on the scanner register and .32 (p < .01)
on the electronic register. Thus, three of the four comparisons for linking maximal and typical
performance in the two samples of clerks yielded significant results, but the most striking finding
here is the generally low correlations between the two components of performance.

Table 2. Speed and Accuracy Correlations

                        TS      MS      TA      MA
Scanner register
  Typical Speed         --
  Maximal Speed         .32**   --
  Typical Accuracy     -.04     .02     --
  Maximal Accuracy      .06     .14*    .12**   --

Electronic register
  Typical Speed         --
  Maximal Speed         .11     --
  Typical Accuracy      .02     .24*    --
  Maximal Accuracy      .20*    .09     .32**   --

Note. TS = typical speed; MS = maximal speed; TA = typical accuracy; MA =
maximal accuracy.

*p < .05, **p < .01

In addition, DuBois et al. (1993) demonstrated a somewhat different pattern of
antecedents or predictors of maximal and typical performance. In particular, for the scanner
sample, a test of perceptual ability predicted maximal performance (speed) significantly better
than it predicted typical performance related to speed. Similarly, the numerical ability-maximal
performance (speed) correlation was significantly higher than the numerical ability-typical
performance correlation for speed in the electronic sample. Again, a more substantial link exists
between ability-related predictors and maximal performance compared to ability-typical
performance relations.

So, what does all of this have to do with PC? We see two important implications of the
above work for the PC research program. First, PC is almost certainly a maximal performance
concept. The intent is to measure capacity for performance rather than typical levels of
performance-over-time. However, it should be recognized that the PC construct may be deficient
as a criterion in the sense that it does not address the typical performance domain. This point
would be moot if the literature indicated high correlations between maximal and typical
performance, but, as we have seen, relationships between these constructs are not very strong
(Sackett et al., 1988).

Second, ability should be the most successful individual differences predictor of maximal
performance. As mentioned, DuBois et al. (1993) found that their ability measures correlated
more highly with maximal performance than with typical performance. Accordingly, it will be
important to continue including ASVAB variables in the research on PC. General cognitive
ability and related abilities and aptitudes should predict PC.

Technical  Proficiency  Versus  Contextual  Criterion  Domains 

Another  distinction  related  to  performance  constructs  in  the  criterion  space  is  between 
task  performance  and  contextual  performance.  Borman  and  Motowidlo  (1993)  recently  discussed 
this  distinction  and  its  implications  for  personnel  selection  programs.  Task  performance  relates 
to the proficiency with which job incumbents perform activities that are formally recognized as
part  of  their  jobs.  They  defined  contextual  performance  as  going  beyond  the  activities  that 
comprise  "the  job"  to  help  accomplish  organizational  goals.  Integrating  elements  of 
organizational  citizenship  (e.g.,  Organ,  1988),  prosocial  organizational  behavior  (e.g.,  Brief  & 
Motowidlo,  1986),  and  a  model  of  soldier  effectiveness  (Borman,  Motowidlo,  &  Hanser,  1983), 
Borman  and  Motowidlo  identified  five  contextual  dimensions.  They  are  listed  below. 

1. Persisting with enthusiasm and extra effort as necessary to complete own task
activities successfully

2. Volunteering to carry out task activities that are not formally part of own job

3. Helping and cooperating with others

4. Following organizational rules and procedures

5. Endorsing, supporting, and defending organizational objectives.

There are conceptual, and at least limited, empirical arguments that suggest contextual
performance is related to organizational effectiveness. That is, organizations with individuals
who are effective in the contextual performance domain will tend to be more effective than
organizations whose incumbents are ineffective in this domain (Borman & Motowidlo, 1993).
Therefore, contextual performance can be seen as important for successful organizational
functioning.

In fact, a recent study by Motowidlo and Van Scotter (in press) sheds light on the
relationships between task performance and contextual performance in an Air Force enlisted
sample. Motowidlo and Van Scotter found that both task and contextual performance related
substantially to overall performance (correlations of .46 and .41 respectively);¹ however, the
two types of performance correlated only .17 with each other. Thus, this study provides more
evidence for the importance of contextual performance in contributing to overall effectiveness,
and the low correlation with task performance indicates that contextual performance cannot be
"covered" by measuring technical proficiency only.

Our point here is that PC is measuring the task performance, technical proficiency
criterion domain and is not tapping the contextual performance domain. Parallel to the
maximal-typical performance discussion, this means that we must acknowledge that PC measures
by themselves will be deficient as criteria. Also parallel to the previous discussion, ability
predictors are most likely to be correlated with task performance, arguing again for including
ASVAB variables in the PC research program.

Another study used the Air Force JPMS database to investigate the link between ability
and technical proficiency. Alley and Teachout (1990) examined the joint effects of aptitude and
experience on job performance. Job performance was measured using the hands-on work sample
tests.  Aptitude  was  measured  using  scores  on  the  ASVAB.  Experience  was  defined  for  each 
job  incumbent  as  their  total  active  federal  military  service  at  the  time  of  testing.  Regression 
analyses  were  employed  to  examine  the  unique  contributions  of  aptitude  and  experience  to  the 
prediction  of  job  performance,  and  to  assess  the  degree  of  interaction  between  the  two  variables. 

Alley and Teachout (1990) found that both aptitude and experience made separate and
unique contributions to prediction of performance. Persons with higher aptitudes performed
better than those with lower aptitude, while individuals with more experience performed better
than those with less experience. Interaction effects were significant in only one of the eight
AFSs. In addition, curvilinear effects were discovered with experience in one AFS, with
aptitude in three AFSs, and with both aptitude and experience in one AFS. As noted by the
authors, the absence of significant interaction effects is consistent with the previous work of
Schmidt et al. (1986, 1988). An important message is that, again, ability and aptitude are
substantially related to technical proficiency performance criteria.
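
The form of such an analysis can be sketched as an ordinary least-squares regression of performance on aptitude, experience, and their product; the product term carries the interaction. The sketch is illustrative only and is not the authors' analysis. All names and data are hypothetical, the scores are generated without noise from known coefficients so that the fit can be checked, and significance testing and curvilinear (squared) terms are omitted.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring up the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def design_row(aptitude, experience):
    """Intercept, two main effects, and the aptitude-by-experience product."""
    return [1.0, aptitude, experience, aptitude * experience]


# Hypothetical data: aptitude as percentile score divided by 10, experience in years.
aptitude = [4.0, 4.0, 6.0, 6.0, 8.0, 8.0]
experience = [1.0, 4.0, 1.0, 4.0, 2.0, 3.0]

# Noise-free scores built from known coefficients with a zero interaction term.
true_b = [2.0, 0.5, 1.0, 0.0]
rows = [design_row(a, e) for a, e in zip(aptitude, experience)]
scores = [sum(b * x for b, x in zip(true_b, row)) for row in rows]

intercept, b_aptitude, b_experience, b_interaction = fit_ols(rows, scores)
```

Here both main-effect weights are recovered and the interaction weight is essentially zero, which is the pattern described above for seven of the eight AFSs. With real data, each weight would be tested against its standard error before being interpreted.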

Models  of  Job  Performance 

To continue the discussion of PC in the context of other criterion concepts, recall that
we discussed performance models that included ability, job knowledge, technical proficiency,
and ratings (Hunter, 1983; Schmidt et al., 1986). Also described was an expanded model that
added personality and selected behavioral variables (Borman et al., 1991). Results of these
efforts were used to make inferences about the likely antecedents of maximal and typical
performance.


¹ Importantly, Motowidlo and Van Scotter had different raters evaluating task, contextual,
and overall performance.

In this section, we describe an attempt to identify latent performance constructs in
common across all first tour U.S. Army soldiers (Campbell, McHenry, & Wise, 1990), and even
more ambitiously, an attempt to explicate a performance model to fit all jobs in the Dictionary
of Occupational Titles (Campbell, McCloy, Oppler, & Sager, 1993). We do this in order to
provide more perspective on what parts of the criterion space PC is likely to measure and what
parts are likely not covered.

Campbell et al. (1990) developed latent variable models to explore the structure of job
performance for first tour soldiers in the U.S. Army. Confirmatory factor analyses of
correlations between a wide array of performance measures suggested a 5-factor model. Those
factors are:


1.  Core  Technical  Proficiency 

2. General Soldiering Proficiency

3.  Effort  and  Leadership 

4.  Personal  Discipline 

5. Military Appearance and Physical Fitness

The  first  two  factors  are  dominated  by  hands-on  and  job  knowledge  test  scores.  Factors 
3  through  5  are  defined  largely  by  performance  ratings  in  these  criterion  domains  and  certain 
administrative  measures  such  as  number  of  awards/commendations  and  number  of  disciplinary 
problems. It appears that PC is closely aligned with the first two factors. However, the more
volitional, extra-technical proficiency factors are not covered by the PC concept.

More  recently,  Campbell  et  al.  (1993)  introduced  an  8-factor  model  that  purports  to 
reflect,  at  the  most  general  level,  all  performance  requirements  for  all  jobs.  Not  all  of  the 
factors  are  relevant  for  all  jobs,  but  the  taxonomy  is  meant  to  be  exhaustive  of  all  possible 
performance  requirements.  The  factors  are  listed  below: 

1. Job-Specific Task Proficiency

2.  Non-Job-Specific  Task  Proficiency 

3.  Written  and  Oral  Communication  Task  Proficiency 

4.  Demonstrating  Effort 

5. Maintaining Personal Discipline

6. Facilitating Peer and Team Performance

7. Supervision/Leadership

8. Management/Administration

Again, we see that PC is likely to cover only some of this "population" of dimensions.
PC seems relevant for Factors 1 through 3 but not so relevant for Factors 4 through 8.

Review of Selected Time Perception and Estimation Literature

Outside of the work done with time-based performance measurement by the PC research
program, few studies can be found that provide useful information to support or argue against
such a system. A brief review of the time perception and time estimation literature offers some
tangential, but relevant research findings to consider for further PC study.

Two opposing theories of time perception are discussed in the literature, one by Ornstein
(1970) and one by Priestly (1968). Ornstein's theory is based on memory storage size and
asserts that time estimates are a function of the amount of effort expended for information
processing. Thus, the more complex the stimulus is, the more processing it will require, and
the time period in which processing occurs will seem longer. Priestly's theory, on the other
hand, is based on the adage "time flies when you're having fun," meaning that the more actively
one must process stimuli, the faster time seems to pass. Thus, time estimates according to
Priestly are an inverse function of stimulus complexity because the higher information processing
requirements demand the cognitive attention that would otherwise be devoted to time estimation.
Ornstein's theory would seem to support the findings of Harville and Skinner (1993) that time
estimates were more accurate for shorter duration tasks which, presumably, present less complex
stimuli.

Hartley, Brecht, Pagerey, Weeks, Chapanis and Hoecker (1977) focused on self-reports
by workers of task identification, rank ordering of tasks according to time spent, and actual time
estimation per task. Results showed that the accuracy of self-report time estimates decreased
with the increase in measurement scale complexity. In other words, the time estimation
accuracy suffered as responses progressed from nominal (task identification) to ordinal (rank
ordering) to ratio (time estimation). Since accuracy was poorest at the ratio level when subjects
were asked for specific time estimates, the authors suggested that objective observers be used
in situations requiring precise task time measurement. The Hartley et al. (1977) study also
supports the short duration task accuracy claims of Harville and Skinner (1993), but they suggest
using actual timed performance whenever possible.

Carroll and Taylor (1969) compared the time allocation estimates of clerical workers with
time measurements obtained through unobtrusive work sampling over a two-week period. They
found that the average correlation between estimated and actual time allocations was .88, a
finding supportive of the notion that at least gross estimates of relative time spent are accurate.
These authors maintained that low-level employees can easily and accurately provide time
estimates to serve as a general guide to the type and frequency of tasks performed on various
jobs, which would be useful for numerous personnel functions.

Along these same lines, Christal and Weismuller (1988) recommended that job
descriptions be based on the percentage of work time that workers spend on each task. They
note, however, that personal experience indicates that many workers do not have a clear idea
of the exact percentage of time devoted to each task that they perform. This experience is
corroborated by other research (e.g., Carpenter, Giorgia, & McFarland, 1975; Wilson &
Harvey, 1990). As a result, they encourage the use of a "relative time spent" scale, believing
that people can state with confidence that they spend more time on one task than on another.
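
Relative ratings of this kind can still yield percentage figures for a job description. The sketch below assumes one common convention, in which each task's rating is divided by the sum of the worker's ratings. The function name, task labels, and ratings are hypothetical, and the sketch is not presented as the procedure those authors specified.

```python
def percent_time_spent(relative_ratings):
    """Convert one worker's relative time spent ratings to percentages.

    Each rating is divided by the sum of all of that worker's ratings,
    so the resulting percentages total 100.
    """
    total = sum(relative_ratings.values())
    if total <= 0:
        raise ValueError("at least one rating must be positive")
    return {task: 100.0 * rating / total for task, rating in relative_ratings.items()}


# Hypothetical ratings on a relative time spent scale (higher means more time).
ratings = {"inspect equipment": 7, "complete forms": 2, "order parts": 1}
percents = percent_time_spent(ratings)
```

The worker only has to judge that one task takes more time than another; the percentages follow from the normalization rather than from direct estimates of clock time.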

Turney and Cohen (1978) examined the estimation of effort expended by asking
information processing personnel to rate the duration of task performance. Despite this
somewhat different orientation, their dependent measures were similar to PC work, namely
perceived time and monitored time. The former was obtained through ratings of effort and time
spent on three tasks by Army personnel and the latter was measured directly by a computer over
seven weeks. The perceived and monitored time measurements were highly consistent, and
effort and time were significantly and positively correlated. However, the strength of the
relationship varied across tasks and time measurement sources.

Another study (Troutwine, 1984) examined the related issue of perceived task quality and
time estimates. Subjects in this study listened to two audio tapes, one containing a boring ethics
passage and the other a tale of mythological adventure, and then rated the tapes according to
how "interesting" and "pleasant" each was. They also estimated the time interval for each tape.
Troutwine found that as the tape's favorability rating increased, the estimated duration
decreased.

Frederickson (1988) examined a theory of temporal experience that states that experience
of time is an interrelation of two components, succession and duration. Succession is also
known as temporal orientation; one can have a future, present, or past orientation to time.
Duration is also called temporal pace, which is the amount of time elapsed in an interval.
Frederickson studied the impact of temporal orientation on temporal pace by analyzing the verb
tense used by clients in a 40 minute therapy session (orientation measure) and obtaining a written
estimate from each client of the time elapsed during the session (pace measure). Results showed
that those clients with a "past" orientation had significantly longer time estimates of the session
(slow pace) than did clients with a present or future orientation. Frederickson (1988) suggested
that both components be assessed simultaneously when considering the influence of temporal
experience on human behavior.

Hogan (1978) proposed a time estimation theory that considers the combined
contributions of stimulus complexity with the personality dimension of extroversion. He
explained that extroverts have a higher stimulation baseline than introverts and thus will perceive
a simple, "boring" stimulus as taking up a longer time period than an introvert would estimate.
Hogan argued that Ornstein and Priestly are both right, but that the true relationship between
stimulus complexity and time estimation is not linear but curvilinear. When considering
individual information processing styles in the form of extroversion, an inverted U-shape best
represents the predictive relationship. Specifically, there is an optimal moderate amount of
stimulus complexity (that is higher for extroverts than introverts) above or below which
information processing slows and estimates of time duration increase. Hogan suggested that
whether one is understimulated or overstimulated, either state is boring and thus perceived to
be lengthy.

While Hogan provided no empirical evidence, Zakay, Lomranz and Kaziniz (1984) tested
his theory, and found support for Hogan's hypotheses. However, stimuli used in this study were
slides of geometrical figures and complexity was determined by the number of internal angles
in the figure. Also, time estimates all averaged under 10 seconds, so applicability of the Zakay
et al. (1984) findings to productive capacity time estimates is tenuous.

These time estimation and perception studies provide a variety of interesting and
tangentially related information. Collectively, these studies suggest the need for rigor in PC
research design because so many content and context variables may influence the accuracy of
time estimation.

Human Learning/Skill Acquisition, Retention, Decay/Learning Curve Theory

The  human  learning  area,  with  its  voluminous  body  of  literature,  is  well  beyond  the  scope 
of this paper. Much has been written about even small subsets of the human learning domain.
Still, a few general comments and reflections about these issues and their relation to the PC
research program are appropriate here.

People differ considerably in the skills they achieve for complex tasks, even after
extensive training. In addition, prolonged practice or experience with some tasks may even
increase individual differences (Fleishman & Mumford, 1989). Cognitive psychology and
information processing approaches to learning discuss the link between individual differences and
task performance. Fitts and Posner (1967) noted three phases of skilled performance during skill
acquisition: cognitive (declarative knowledge), associative (knowledge compilation), and
autonomous (procedural knowledge). Shiffrin and Schneider (1977) argued that not all tasks
allow skill acquisition along the stages described by Fitts and Posner. Rather, tasks that require
learners to deal with novel situations/demands never allow them to progress beyond the first or
second stage. Shiffrin and Schneider (1977) described these novel or inconsistent tasks as
requiring "controlled" information processing, while simple, consistent tasks allow "automatic"
information processing.

Ackerman  (1986,  1987)  and  Ackerman  and  Humphreys  (1990)  proposed  that  three  major 
ability  classes  relate  to  individual  differences  at  these  three  stages  of  learning.  During  Phase  1 
general  intelligence  is  critical;  during  Phase  2  knowledge  compilation  involves  integration  of  the 
cognitive and motor processes required for performing a task, while Phase 3 occurs when the
individual has automatized the skill. These phases of skill acquisition are consistent with the
notion  of  three  skill  acquisition  stages:  (1)  novice,  (2)  journeyman,  and  (3)  master. 

Unfortunately these learning stages often interact with, and are affected by, the military
training/performance environment. By its very nature, the military environment fosters/requires
(1) high levels of planned and unplanned turnover, (2) wide variation in task content and task
difficulty, and (3) extreme variability of initial skills and ability in the entrant population (Lane,
1987).

The regular cycle of change and progression within the military environment means that
as the airman begins to move toward the "mastery" stage, and even the "journeyman" stage, as
often as not he/she begins to assume supervisory responsibilities, and the frequency and recency
of technical task performance may begin to decline.

Not much is known about how an individual trainee’s characteristics are related to
long-term retention of skills (Fleishman & Mumford, 1989). However, a study by Fleishman and
Parker (1962) suggests that levels of retention/decay were explainable in terms of individual
differences among subjects in the habits acquired during practice of the original task. In other
words, final level of proficiency during training was an important factor in level of proficiency
maintained.

These issues of skill acquisition and decay may play an important role in the PC
measurement research. Of special note here are the findings by Faneuff (1993) that PC initially
increases with experience until it reaches a maximum and then begins to steadily drop off, as
shown in his plotting of response curves. Faneuff noted that the decreasing PC with increasing
experience over a portion of the curves and surfaces was unexpected. He speculated that there
might be some point in an airman’s career where he or she may begin to experience skill
degradation.

Throughout the literature there are regularities in the general form of learning and
acquisition curves. As noted by Lane (1987), the negatively accelerated curve is not only
"typical" of group performance, but it is found in most real-world training situations. In
addition, the general shape of the learning curve is consistent with all major theoretical
explanations of how skill acquisition proceeds. However, Lane also noted that while the
negatively accelerated shape is quite common, parameters of the curves tend to be situation and
task dependent. Thus, the particular curve "family" providing the best fit to a given set of data
is likely to vary as a function of the task, its components and difficulty level, the characteristics
of the people performing the task, the length of the practice (as well as the frequency and
recency of performance), the way performance is measured, and the training method used.
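
One way to make the "negatively accelerated" shape concrete is a power-law curve, in which
each additional trial saves less time than the one before. The sketch below is illustrative
only. The power-law form, the name power_law_time, and the parameter values are assumptions
introduced here; Lane (1987) stresses that the best-fitting curve family depends on the task
and situation.

```python
def power_law_time(first_trial_time: float, trial: int, learning_rate: float) -> float:
    """Time to perform a task on a given trial under an assumed power-law curve.

    first_trial_time: time on trial 1 (any time unit).
    trial: trial number, starting at 1.
    learning_rate: assumed exponent b > 0; larger values mean faster improvement.
    """
    if trial < 1:
        raise ValueError("trial numbers start at 1")
    return first_trial_time * trial ** (-learning_rate)


# Hypothetical task: 100 minutes on the first trial, exponent 0.5.
times = [power_law_time(100.0, n, 0.5) for n in range(1, 6)]
# Time saved between successive trials shrinks, which is the
# negatively accelerated shape described in the text.
gains = [earlier - later for earlier, later in zip(times, times[1:])]
```

Fitting the exponent separately by task or specialty would be one way to reflect the
situation and task dependence noted above.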

What does all of this mean for the productive capacity research project? Performance
is a reflection of these acquisition, retention, and decay patterns. Performance is also linked to
aptitude, ability, and task characteristics. Consequently, it becomes crucial for PC research to
gather as much data specifically related to these variables as possible.


IV. PERSPECTIVES ON PC APPLICATIONS

We see two related applications stimulating future research on PC: determining
manpower requirements and setting standards to achieve those requirements. Manpower and
personnel are two separate functions in the Air Force. Manpower planners determine the
number and skill levels of positions required to perform specific jobs at locations worldwide.
Personnel specialists select, classify, and allocate people to the manpower positions.

What is missing in this process is attention to the broad range of capabilities possessed
by individuals. We believe PC, with its interactive effects of aptitude and experience, can
contribute to better management of manpower and personnel functions within the Air Force.

Manpower  Requirements 

The Logistics Composite Model (LCOM), created in the late 1960s, is the accepted Air
Force manpower model (Boyle, 1990). This model is a policy analysis tool which relates
base-level logistics resources with each other and with sortie generation capacity. The logistics
resources modeled in LCOM include maintenance personnel, spare parts, and aerospace ground
equipment. When using this model, the spare parts are constrained, then manpower constraining
is performed. With the objective of maximizing sortie generation potential, LCOM is used to
prevent manpower staffing from being too high (i.e., idle or underutilized) or too low (i.e., too
busy or overutilized). The manpower for each AFS is optimally constrained when adding
manpower does not affect sortie rate, and reducing manpower would drop the sortie rate below
the desired level.

Due to the lengthy process involved in determining LCOM manpower estimates, the
Queuing Manpower Model (QMAN) was developed (Grobman, Quick, & Weaver, 1994). This
model applies a queuing algorithm to AFS/crew size clusters to determine the manpower
necessary to meet flying demands. This value is then compared to utilization and crew size
effects to determine the actual manpower requirements. QMAN provides rapid manpower
estimations that correlate well with the estimates from LCOM. These faster estimations give
QMAN capabilities, such as determining how specialty structuring affects manpower
requirements, that LCOM does not have. Also, QMAN has the PC-relevant capacity to
demonstrate that as the task performance times of aircraft maintainers decrease, the number of
aircraft maintainers that are necessary to support flying decreases.

With models such as QMAN available, time-based, quantitative measures would be useful
to Air Force manpower planners. Currently, the aggregate of all manpower positions, tempered
by total mission objectives for a given year, forms the basis for budget submissions to sustain
the objective force. Congressional authorizations in turn specify the number of people the Air
Force can expect to fill its manpower requirements. Requested manpower and authorized end
strength seldom match precisely.

Once end strength is set, personnel planners determine the number of accessions required
to fill projected vacancies. They also determine the training requirements for new people. Air
Force applicants are selected for service based on measures of individual differences in aptitudes,
assuming that some minimum degree of mental quality is necessary to successfully accomplish
entry level jobs. This process also implicitly assumes that, once trained, airmen will achieve
desired proficiency sometime before expiration of their enlistment and that performance will
increase with experience to prepare them for higher skill activities. However, newly trained
airmen may be able to meet job quality requirements, but the speed or quantity of their
performance on the job may differ substantially from the expected level.

PC has potential to enhance manpower planning by considering tradeoffs in amount of
work expected in given periods of time by people who have different qualifications. For
entry-level manpower planning, PC offers tradeoffs between the number of people required and the
quality of people (fewer high-quality people or more low-quality people). Such tradeoffs could
be worked into the end strength plan and made into recruiting objectives. Personnel planners
could prioritize personnel quality goals according to importance of jobs, or job locations, for
specific mission needs.

For manpower planning to sustain a specified force, experience could be factored in, to
allow tradeoffs with the number of people required (fewer people who have extensive experience
or more people who have limited experience). Personnel planners in this context could establish
cross-training or reenlistment objectives to achieve the desired force capability.

If research demonstrates consistent relationships of PC with aptitude and experience in
a variety of job specialties, the ratio properties of PC could form the basis for quantity/quality
and quantity/experience tradeoffs discussed above. Conceivably, two airmen with PC = .5 can
produce the same amount of work as one airman with PC = 1.0 or four airmen with PC = .25.
By considering the design of specific job tasks (independent or team performance, backup
requirements, etc.), recruiting budget (realistic quality goals), and end strength limitations,
planners could develop manpower requirements to maximize expected performance. Research
is needed to develop models that will take all of these factors into consideration.
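
The quantity/quality arithmetic above can be sketched in a few lines. This is a minimal
illustration that assumes PC values simply add across airmen working independently, which is
the "conceivable" case described in the text and not an established result. The function name
headcount_for_workload is hypothetical.

```python
import math


def headcount_for_workload(required_output: float, pc: float) -> int:
    """Smallest whole number of airmen, each at the given PC, needed to
    deliver the required output, assuming individual PC values are additive.

    required_output is expressed in units of one fully productive airman
    (PC = 1.0) over the same period of time.
    """
    if pc <= 0:
        raise ValueError("PC must be positive")
    # Round before taking the ceiling so floating-point noise in the
    # division cannot push an exact result up to the next integer.
    return math.ceil(round(required_output / pc, 9))


# The example from the text: one PC = 1.0 airman's worth of work.
for pc in (1.0, 0.5, 0.25):
    print(pc, headcount_for_workload(1.0, pc))
```

Team tasks, backup requirements, and end strength limits would each break the simple
additivity assumed here, which is why the text calls for fuller models.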

As an adjunct to the manpower planning process, redesign of individual jobs themselves
might be enhanced by study of PC. Because PC measurement addresses time to perform specific
tasks, it may be possible to group tasks in more logical ways to enhance performance.
Regrouping tasks may then suggest more logical organization of jobs themselves. Also,
consideration of different requirements for jobs under peacetime or wartime conditions may
affect job design.


Standards  Setting 

The perspectives on PC application for determining manpower requirements are
consistent with the direction of past research and point to ways to fill gaps in personnel
management within the Air Force. Likewise, job selection standards based on prediction of PC
can potentially aid in selecting people capable of performing at desired levels within time
objectives. Much of today’s standards setting process is judgmental: commanders, trainers,
and functional managers assess the capabilities of assigned personnel and adjust standards to
modify capabilities. PC offers a more sophisticated, empirically based source of information
to help make these judgments. Although these directions appear fruitful, there are many
specifics to be developed. The discussion here is intended to lay out current thinking and draw
upon previous studies that may support future PC research. In this section we first define the
problems that standards setting addresses. Second, the major methods of setting standards are
described. And third, we speculate on how this concept may be relevant to PC-based standards.

Performance is typically measured on a continuum (e.g., from very effective to very
ineffective performance). Furthermore, most researchers concerned with performance
measurement would likely judge such measures to be on an ordinal scale, with performance
scores comparable only in terms of rank order (for example, many rating scales use terms like
"outstanding" or "average" which do not convey information about absolute amount of
performance or amount of performance more than the next lower level). It is possible that with
some performance measures, interval scale assumptions may be defensible. For example, with
hands-on, work sample performance tests, the percent correct scores may be argued to possess
interval properties (i.e., the number of units between two levels of work can be determined).
Whether ordinal or interval assumptions hold, continuous scales for indexing performance are
useful, as long as we are simply comparing performance levels among incumbents.

There are applications, however, where we would like to measure performance
dichotomously. An important practical question to ask in some situations may be, "Is this
incumbent qualified or not?" or "Is he or she a satisfactory performer or performing at less than
an acceptable level?" Dichotomous performance measurement may be useful in at least three
human resource applications. First, the identification of training needs can benefit from knowing
whether individual incumbents are performing satisfactorily or unsatisfactorily. A train/do not
train decision for an incumbent is simplified if we know that the person is qualified or
unqualified in a particular aspect of the job.

Second, the effect of other personnel programs or interventions can be meaningfully
evaluated and relatively easily explained to managers by comparing the numbers or percentages
of incumbents performing satisfactorily after the program or intervention with the numbers or
percentages performing at a satisfactory level before the program or intervention. Third, goal
setting is most effective when specific, non-ambiguous goals are set (e.g., Locke, Shaw, Saari,
& Latham, 1981), and therefore a goal of becoming qualified on a task or job, where the
standards for qualification or satisfactory performance are well specified, is likely to provide the
desired effort and motivation outcomes.

Accordingly, for these and related human resources applications, standards setting, with
the accompanying capability to measure performance dichotomously, satisfactory or
unsatisfactory, qualified or not, has some definite advantages. We now briefly review methods
for establishing such standards. Advantages and disadvantages of each method using PC are also
noted.

Broadly speaking, there are two primary approaches for setting performance standards
in the context of mental abilities/aptitude testing: item-based and examinee-based methods.
Item-based methods focus on the performance test and evaluate how a minimally competent
person is likely to score on the test. Examinee-based methods start with identifying competent
and incompetent incumbents and then derive a performance test cutoff score separating the two
groups.

Item-Based Methods. Nedelsky (1954) espoused a method for determining minimum
acceptable scores on multiple choice paper-and-pencil tests. His approach involves having each
job expert (experienced job incumbent or well-qualified supervisor) review each test item and
identify response alternatives that the minimally competent incumbent would not pick as a
correct answer. Then, for that expert, the reciprocals of the number of response alternatives for
each item not so identified are added to obtain a minimum passing score. Thus, for a
hypothetical 2-item test, with 4 response options for each item, suppose an expert identified 2
response options with each item that the minimally competent examinee would not pick as
correct. The minimum acceptable score then would be 1/2 + 1/2 = 1. The actual cutoff score
for a test is established by averaging these scores across several experts.
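
The Nedelsky arithmetic just described can be written out directly. The sketch below
reproduces the 2-item example from the text. The function names and the second expert's score
are hypothetical, introduced only for illustration.

```python
from fractions import Fraction
from typing import Sequence


def nedelsky_expert_score(options_per_item: Sequence[int],
                          eliminated_per_item: Sequence[int]) -> Fraction:
    """Minimum passing score implied by one expert's judgments.

    For each item, the expert marks the response options a minimally
    competent examinee would rule out. The item contributes the reciprocal
    of the number of options left.
    """
    total = Fraction(0)
    for n_options, n_eliminated in zip(options_per_item, eliminated_per_item):
        remaining = n_options - n_eliminated
        if remaining < 1:
            raise ValueError("at least one option must remain for each item")
        total += Fraction(1, remaining)
    return total


def nedelsky_cutoff(expert_scores: Sequence[Fraction]) -> Fraction:
    """Test cutoff: the average of the experts' minimum passing scores."""
    return sum(expert_scores, Fraction(0)) / len(expert_scores)


# Example from the text: 2 items, 4 options each, 2 options eliminated per item.
score = nedelsky_expert_score([4, 4], [2, 2])  # 1/2 + 1/2 = 1
```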

The Angoff (1971) method requires a somewhat different judgment on the part of job
experts. Each expert is to consider a group of minimally competent workers and estimate the
percent of this group that would get each item on the test correct. These percentages are then
averaged across all items on the test and, as desired, across experts.
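
As described here, the Angoff computation is a two-stage average. A minimal sketch follows.
The name angoff_cutoff and the ratings are made up for illustration.

```python
from statistics import mean
from typing import Sequence


def angoff_cutoff(estimates_by_expert: Sequence[Sequence[float]]) -> float:
    """Percent-correct cutoff from Angoff-style judgments.

    estimates_by_expert[e][i] is expert e's estimate of the percent of
    minimally competent workers who would answer item i correctly.
    Estimates are averaged over items for each expert, then over experts.
    """
    per_expert = [mean(item_estimates) for item_estimates in estimates_by_expert]
    return mean(per_expert)


# Two hypothetical experts rating a 2-item test.
cutoff = angoff_cutoff([[60.0, 80.0], [50.0, 70.0]])  # experts average 70 and 60
```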

Ebel’s (1972) strategy for determining a minimally acceptable score on a test is quite a
bit more complex. To employ this method, a 2-dimensional matrix of item difficulty (e.g., easy,
medium, hard) by relevance of item for successful performance (e.g., essential, important,
questionable) is first constructed. Then, each job expert sorts each test item into one of the cells
and answers the question for each cell, "If a borderline examinee were to respond to items like
these, what percentage of the items would be answered correctly?" These percent estimates are
subsequently averaged, weighted by the number of items in each cell. Again, typically, results
from several experts are averaged.
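
The weighting step in Ebel's method is easy to misread, so a small sketch may help. The
cell labels follow the examples in the text. The function name, item counts, and percentages
are hypothetical.

```python
from typing import Dict, Tuple

# Maps (difficulty, relevance) to (number of items sorted into the cell,
# percent of such items a borderline examinee would answer correctly).
CellJudgments = Dict[Tuple[str, str], Tuple[int, float]]


def ebel_expert_cutoff(cells: CellJudgments) -> float:
    """One expert's percent-correct cutoff: the cell percentages averaged,
    weighted by the number of items in each cell."""
    total_items = sum(n_items for n_items, _ in cells.values())
    if total_items == 0:
        raise ValueError("no items were sorted into any cell")
    weighted_sum = sum(n_items * pct for n_items, pct in cells.values())
    return weighted_sum / total_items


# One hypothetical expert: 3 easy/essential items, 1 hard/questionable item.
judgments = {
    ("easy", "essential"): (3, 80.0),
    ("hard", "questionable"): (1, 40.0),
}
expert_cutoff = ebel_expert_cutoff(judgments)  # (3*80 + 1*40) / 4
```

Averaging expert_cutoff across several experts would give the final test cutoff.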

Jaeger’s (1982) method is more straightforward. Job experts are asked to review each
test item and answer the question, "Should every examinee who is at least minimally competent
answer this question correctly?" The number of items for which the answer is "yes" then
becomes the minimally acceptable score, and these scores are usually averaged over experts.
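
Jaeger's procedure reduces to counting "yes" answers and averaging the counts. A minimal
sketch follows, with a hypothetical function name and made-up judgments.

```python
from typing import Sequence


def jaeger_cutoff(judgments_by_expert: Sequence[Sequence[bool]]) -> float:
    """Minimally acceptable score under Jaeger-style judgments.

    judgments_by_expert[e][i] is True when expert e says every minimally
    competent examinee should answer item i correctly. Each expert's score
    is the count of True answers; the cutoff is the mean over experts.
    """
    per_expert = [sum(1 for answer in answers if answer)
                  for answers in judgments_by_expert]
    return sum(per_expert) / len(per_expert)


# Two hypothetical experts judging a 3-item test.
cutoff_items = jaeger_cutoff([[True, True, False], [True, False, False]])
```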

Unfortunately, each of these methods has substantial problems with respect to its
relevance to PC. There are many criticisms of Nedelsky’s (1954) method, including low
confidence levels reported by experts making the judgments (Poggio, 1984) and unrealistic
assumptions about the likely judgment process experts employ in identifying response options
(Jaeger & McNulty, 1986). However, the overriding practical problem is that the method can
only be used for multiple choice tests.

The Angoff (1971) strategy is more promising for PC-based standards setting.
Psychometric properties of the expert judgment resulting from the method are reasonably good
(Norcini, Lipner, & Langdon, 1987). Application of the method would, however, seem to
require a somewhat different judgment task on the part of the job experts. Perhaps experts could
directly estimate the time to completion for a minimally qualified and minimally acceptable
incumbent on each task. Experts might be aided by the fastest, typical, and slowest benchmark
times for each task, provided they had been previously estimated. One would suppose that the
"correct" estimate for this judgment would be somewhere between the typical and slowest times.
Similar to what was done for the benchmark estimates, interrater agreement between experts
might be used to evaluate the quality of the minimally qualified time estimates.

Ebel’s (1972) method does not appear applicable to PC measurement. Tasks might be
treated as items and sorted into a difficulty by relevance matrix, but asking experts to estimate
the percent of tasks minimally competent incumbents could get "correct" (e.g., less than the
normal time?) does not seem very useful.

Finally, Jaeger’s (1982) approach could perhaps be adapted to develop minimum
acceptable time estimates for tasks. Instead of asking job experts to make direct time estimates
for minimal competence on each task (as was suggested previously), they might be instructed
to identify for a task the longest time to completion that "every examinee who is at least
minimally competent" would be allowed.

Examinee-Based Methods. Zieky and Livingston (1977) developed what they called the
borderline-group procedure. With this method, job experts knowledgeable about the
performance of several incumbents are asked to sort these incumbents into three groups:
competent, borderline, and incompetent. The median test score for the borderline group is then
incorporated as the minimally acceptable score.
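
The borderline-group computation is a single median once the sorting is done. A sketch
with hypothetical scores and a hypothetical function name:

```python
from statistics import median
from typing import Sequence


def borderline_group_cutoff(scores: Sequence[float],
                            groups: Sequence[str]) -> float:
    """Minimally acceptable score: the median test score of the incumbents
    that job experts sorted into the 'borderline' group."""
    borderline = [s for s, g in zip(scores, groups) if g == "borderline"]
    if not borderline:
        raise ValueError("no incumbents were sorted into the borderline group")
    return median(borderline)


# Five hypothetical incumbents with test scores and expert sortings.
scores = [55, 62, 70, 48, 90]
groups = ["borderline", "borderline", "borderline", "incompetent", "competent"]
cutoff_score = borderline_group_cutoff(scores, groups)
```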

The contrasting-group method is similar to the borderline-group procedure. Livingston
and Zieky (1982) suggested asking job experts to identify incumbents who are definitely
qualified and incumbents who are definitely not qualified. Then, performance test scores for
the two groups can be plotted, and the minimally acceptable score is set at the point where the
two distributions of scores intersect.
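
With small samples the intersection of two plotted distributions is hard to read off a graph.
One simple stand-in, assumed here and not taken from Livingston and Zieky, is to choose the
cutoff that misclassifies the fewest incumbents. The function name and scores are hypothetical,
and higher scores are assumed to be better.

```python
from typing import Sequence


def contrasting_groups_cutoff(qualified: Sequence[float],
                              unqualified: Sequence[float]) -> float:
    """Approximate the crossing point of the two score distributions by the
    observed score that, used as a pass mark, misclassifies the fewest people.

    A person passes when score >= cutoff. Ties go to the lowest such cutoff.
    """
    candidates = sorted(set(qualified) | set(unqualified))
    best_cutoff, best_errors = candidates[0], None
    for cutoff in candidates:
        failed_qualified = sum(1 for s in qualified if s < cutoff)
        passed_unqualified = sum(1 for s in unqualified if s >= cutoff)
        errors = failed_qualified + passed_unqualified
        if best_errors is None or errors < best_errors:
            best_cutoff, best_errors = cutoff, errors
    return best_cutoff


# Hypothetical test scores for the two expert-identified groups.
cut = contrasting_groups_cutoff([70, 75, 80, 85], [40, 50, 60, 65])
```

For time-to-completion data, where lower values are better, the comparisons would be
reversed so that a person passes when time <= cutoff.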

Both advantages and disadvantages of these two methods have been discussed in the
literature. Example advantages are the relative simplicity of the procedures and the fact that
they can be used with any kind of performance test or rating (see Poggio, 1984). Disadvantages
include the requirement that the sample of job incumbents used in these procedures must be
representative of the target population of incumbents and, with the borderline-group method,
identification of the borderline group (usually a very small number) may be difficult (Jaeger &
McNulty, 1986).

With respect to PC-based standards, it is possible that these examinee-based methods
could be used to help evaluate the validity of job expert minimally acceptable time estimates,
for at least a few tasks. After a minimal competency time has been estimated for a task, job
incumbents might be identified as competent or incompetent (and perhaps borderline), and actual
time to completion scores for hands-on performance on the task assessed for each incumbent.
Using the contrasting-group method, for example, the intersection of the time to complete score
distributions could then be noted, and that time compared to the time estimate made using the
revised Angoff or Jaeger methods. Similar to Skinner, Faneuff, and Demetriades (1991),
correlations between the two sets of estimates might be computed, provided that a reasonably
large number of tasks were included. This correlation would then serve as one index of the
validity of the minimally acceptable time estimates.
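
The validity index proposed above is an ordinary product-moment correlation computed
across tasks. The sketch below assumes one expert-derived time and one empirically derived
cutoff time per task. The function name and all times are invented for illustration.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length with at least 2 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical minimally acceptable times (minutes) for five tasks.
expert_times = [12.0, 30.0, 45.0, 20.0, 60.0]     # revised Angoff or Jaeger judgments
empirical_times = [14.0, 28.0, 50.0, 18.0, 66.0]  # contrasting-group intersections
validity_index = pearson_r(expert_times, empirical_times)
```

With only a handful of tasks the coefficient is unstable, which is the reason the text
asks for a reasonably large number of tasks.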

Our brief review of the literature suggests several ideas applicable to PC-based standards
setting. Both the item-based and examinee-based approaches to setting performance standards
offer methodologies that could prove useful. Either a modified version of the Angoff or Jaeger
approaches to item-based standards setting could be adapted for PC measurements. The Angoff
method would require experts to directly estimate time to completion per task for a minimally
qualified incumbent. Jaeger’s approach would focus on having experts identify for a task the
longest time to completion allowable for a minimally competent examinee. The examinee-based
methods may be more useful for validating the item-based procedures than for setting
performance standards. Correlations between contrasting-group (or borderline-group) scores and
scores derived from item-based estimates could be used as one index of the validity of these
time estimates.


V. IMPLEMENTATION AND FUTURE PC RESEARCH


Earlier we raised a few issues relevant to the application of PC research to establishing
manpower requirements and setting standards. We then discussed how some of the current
standards setting methods relate to PC. Here we return to the broader framework for
considering PC within the context of setting standards to maximize military output for a given
cost. All new ideas for better personnel management ultimately face the test of cost
effectiveness: does the idea provide something of value that justifies an investment in it?
Although PC research is far from generating operational systems, the research should proceed
with implementation in mind. In this section we will turn to thoughts developed by Black (1988)
as he considered the DoD Job Performance Measurement Project and looked ahead to
implementing performance-based measures for setting standards. The issues he raised are
equally relevant to PC.

Black’s chief concern centered on the impact on overall military capability from new
standards setting approaches that seek to maximize individual performance. If an objective of
new standards is to allow tradeoffs between greater military output associated with higher-quality
entrants and their higher cost compared with lower-quality entrants, seeking maximum individual
performance alone may result in suboptimal quality solutions for the military overall. Such
objectives tend to miss the impact of quality on unit or group output, which is more closely
aligned with military capability. Individual performance has an interactive effect on group
output (rather than a simple, unadjusted summation of independent efforts). Black raised a
number of issues to consider in research on performance-based standards setting. In summary,
these issues are:


• The military personnel systems are closed systems. An individual typically enters at a
low level (unskilled and low ranking) and the Services invest heavily in training and
career development as the person progresses upward. Recruiting and training programs
are expensive. Services have numerous options when it comes to tradeoffs among
personnel, manpower, and training, all seeking the best return for the investment.
Ideally, there should be a balance between defense capability and cost.

• Job proficiency alone does not sum to unit capability. As noted earlier, there are
complex interactions among people and job requirements. For example, within work
groups the performance of one person is often influenced by the ability and performance
of others in the group. Also, individual contributions are partially affected by the
availability, type, and sophistication of equipment for getting the job done. Tradeoffs
are possible among quality of personnel and sophistication of equipment.

• Not all jobs, independent of how well they are performed, are equally valuable to overall
mission capability. For future personnel management systems, the value of both military
and private sector jobs must be specified.

• Most performance research to date has focused on first-term airmen, but mission
capability depends on the contributions of many people working in concert. It is not
necessarily true that the skills needed for success in the first term of enlistment have a
substantial bearing on success later in one’s career. From an economic standpoint,
however, performance in the future is usually regarded as less valuable than performance
today. Thus, while PC research should someday consider the broader career progression
of personnel, the value of future performance must be discounted for costing exercises.

• Changes in entry standards that affect accession quality mixes will affect the pattern of
retention (survival rates). Not everyone who enters stays for a full career. Thus, future
value of individual performance must be weighted by survival rates.

• The present value of an individual’s military contribution over a career is the sum of
each year’s contribution, weighted by the associated survival rate and discounted back
to the present period. The present values of all individuals in an accession cohort could
be summed together, in principle, to measure a Service’s expectation of what an
accession cohort will contribute over its expected military lifetime. Adjustments to
standards would have ripple effects on job performance well beyond the first term, and
ultimately affect the lifetime performance of the cohort.

• Different quality personnel have different associated costs for recruiting and training.
Typically, higher-quality individuals cost more to recruit because they have more
employment or college options. However, high-quality recruits are more likely to
succeed in training in shorter times (less total training required and fewer replacements
for training failures).
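
The present-value calculation described in the list above (each year's contribution, weighted
by survival and discounted to the present) can be sketched as follows. The function names, the
contribution and survival figures, the discount rate, and the convention that the first year is
not discounted are all assumptions made for illustration, not values from Black (1988).

```python
from typing import Sequence


def present_value_of_career(contributions: Sequence[float],
                            survival_rates: Sequence[float],
                            discount_rate: float) -> float:
    """Survival-weighted, discounted sum of yearly contributions.

    contributions[t]: value of the individual's output in year t (year 0 = now).
    survival_rates[t]: probability the individual is still serving in year t.
    discount_rate: annual rate used to discount future years to the present.
    """
    if len(contributions) != len(survival_rates):
        raise ValueError("need one survival rate per year of contribution")
    return sum(
        c * s / (1.0 + discount_rate) ** t
        for t, (c, s) in enumerate(zip(contributions, survival_rates))
    )


def cohort_present_value(individual_values: Sequence[float]) -> float:
    """Expected lifetime contribution of an accession cohort: the sum of its
    members' present values."""
    return sum(individual_values)


# Hypothetical four-year profile: output grows with experience, survival falls.
pv = present_value_of_career([0.5, 0.8, 1.0, 1.0], [1.0, 0.9, 0.8, 0.4], 0.05)
```

Changing entry standards would alter both the contribution and the survival series, which is
how the ripple effects mentioned in the list would enter such a calculation.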

Considering all of these issues, Black suggested that an alternative to building individual
performance models would be to focus on the performance of small cohesive groups that produce
identifiable products or services. Statistical analyses of group output and the quality of its
members avoid the tangle of questions concerning job performance and the interaction of people
with each other and with equipment and technology. Research could proceed to minimize the
costs of attaining a given level and distribution of group outputs, or to maximize group outputs
for a given cost.


Thoughts on PC Implementation


What are the first steps in incorporating PC into manpower planning and personnel
standards? One step is to simply facilitate understanding the performance capabilities of the
current force. Today’s functional managers, commanders, and trainers lack sound, empirical
performance data on their personnel. Managers do have knowledgeable insights into the
capabilities of forces of given size and quality characteristics, but much of their insight is a
product of anecdotal evidence accumulated from personal experience. Analyses of current
specialties by grade and skill level using PC as the criterion could be informative and revealing
of deficiencies. Such distributions of PC could serve as the basis for more informed judgments
about how the force should be shaped. Job redesign and revised standards could achieve desired
distributions. Skills required for entry level jobs surely differ from those required at more
complex levels. Based on analyses of performance distributions at various levels, standards
could be set at several levels to achieve a mix of performance capabilities within an accession
cohort. In this way, an entering cohort could be optimally shaped to sustain various career
progression characteristics as the cohort ages. In a similar way, manpower standards could be
expanded to include different levels of experience desired for various positions (in addition to
the basic skill and grade requirements currently specified). Personnel planners would then have
the option of allocating personnel on the basis of quality (aptitude) or experience, depending on
the characteristic of the available pool of airmen.

Such a shift in manpower and personnel management will not happen at once. For such
distribution-based approaches to work, the relationship between criteria, such as PC, and
predictors, such as ASVAB scores, must be clearly established. Functional managers must then
be introduced to the notion that analyses based on PC can provide useful information in addition
to other more familiar sources.

One potential first step toward securing functional manager support would be to solicit
their judgments about minimum times to perform job tasks under peak load conditions. The
relationship between PC and aptitude scores could be used to set standards to assure peak
performance. This approach shifts the focus from changing (and implicitly criticizing) current
force characteristics to posturing for future peak load conditions (something functional managers
only think about).

Another perspective involves the use of PC in lieu of training performance as the
criterion for standards setting. This notion is based on the ability of subject-matter experts (or
functional managers) to relate to time-based performance requirements more than quality
metrics. Experts would set task benchmark times, such as fastest, slowest, and normal times
for acceptable performance. If the accuracy of benchmarks can be consistently confirmed by
comparing actual timed measures of tasks with expert estimates, aggregation of benchmarks
across tasks should be justified to provide a sense of fastest and normal times required for whole
sets of tasks. For desired levels of PC (such as in wartime and peacetime operations), tables
of aptitude and experience curves can be constructed to set minimum aptitude requirements. The
same tables could be used to set multiple cutoffs to achieve specified distributions of PC as
discussed earlier.
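
As a rough illustration of the benchmark check described above, the following Python sketch
compares expert-estimated normal times with observed timed measures and aggregates the
benchmarks only when the two agree within a tolerance. The task names, the times, the 15
percent tolerance, and the function names are all hypothetical and are not drawn from any PC
study. The agreement statistic used here (mean absolute percentage error) is one possible
choice, not one the research prescribes.

```python
from statistics import mean

def mean_abs_pct_error(expert, observed):
    """Mean absolute percentage error of expert estimates against observed times."""
    return mean(abs(expert[t] - observed[t]) / observed[t] for t in expert)

def aggregate_benchmarks(benchmarks):
    """Sum the fastest and normal benchmark times (minutes) across a set of tasks."""
    return {
        "fastest": sum(b["fastest"] for b in benchmarks.values()),
        "normal": sum(b["normal"] for b in benchmarks.values()),
    }

# Hypothetical expert benchmarks, in minutes, for three tasks.
benchmarks = {
    "task_a": {"fastest": 20, "normal": 30, "slowest": 50},
    "task_b": {"fastest": 30, "normal": 45, "slowest": 70},
    "task_c": {"fastest": 6, "normal": 10, "slowest": 18},
}

# Hypothetical observed mean times, in minutes, from timed measurement.
observed_normal = {"task_a": 33, "task_b": 42, "task_c": 12}

expert_normal = {task: b["normal"] for task, b in benchmarks.items()}
error = mean_abs_pct_error(expert_normal, observed_normal)

TOLERANCE = 0.15  # assumed threshold for treating the benchmarks as confirmed
totals = aggregate_benchmarks(benchmarks) if error <= TOLERANCE else None
```

With these made-up figures the expert estimates fall within the assumed tolerance, so the
aggregated fastest and normal times for the task set are computed; a larger disagreement
would leave `totals` unset, signalling that aggregation is not yet justified.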

Another immediate application of PC technology might be to calibrate relative measures
of occupational learning difficulty across tasks. Alley (1988) discussed the use of occupational
analysis data as criteria for setting enlistment standards; the notion being that specialties can be
rank ordered according to difficulty, with higher quality standards justified for more difficult
jobs (assuming the jobs are also important). But relative difficulty is only an ordinal-level scale
that does not specify precise predictor cutoff scores. PC by definition is a ratio-level construct.
For example, if one person does a task in 60 minutes and another does it in 30 minutes, the
second person has twice the PC of the first on that task. If functional managers can specify
desirable performance times for groups of tasks, the relationship between PC and aptitude can
be used to set a minimum quality standard. Aggregation of times across groups of tasks can lead
to standards for a specialty. By going through this process with PC-based standards on a
number of representative specialties, researchers should be able to establish relationships with
learning difficulty that can serve as approximations for standards based on difficulty alone when
PC data are not available. Difficulty ratings can be collected operationally along with other
task-level information as a matter of course in the occupational survey program. Further research
may suggest better measures, such as job incumbent self-reported task times, that could support
PC-based standards decisions.
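
The ratio-scale example and the cutoff-setting step above can be sketched in Python. This is
only an illustration under stated assumptions: relative PC is taken as the inverse ratio of
task times, and the relationship between aptitude and task time is assumed to be linear with
made-up coefficients. Neither the functional form nor the numbers come from the PC research,
and the function names are hypothetical.

```python
import math

def relative_pc(time_a, time_b):
    """PC of person B relative to person A on one task (a faster time means higher PC)."""
    return time_a / time_b

def minimum_aptitude(target_minutes, intercept, slope):
    """Lowest whole aptitude score whose predicted task time meets the target.

    Assumes predicted_minutes = intercept + slope * aptitude, with slope < 0.
    """
    if slope >= 0:
        raise ValueError("slope must be negative: higher aptitude should mean less time")
    # Dividing by a negative slope flips the inequality, so round up.
    return math.ceil((target_minutes - intercept) / slope)

# The example from the text: 60 minutes versus 30 minutes on the same task.
ratio = relative_pc(60, 30)

# Hypothetical fitted line: predicted minutes = 90 - 0.5 * aptitude score.
# A functional manager specifies a desired performance time of 60 minutes.
cutoff = minimum_aptitude(60, intercept=90.0, slope=-0.5)
```

Because times are on a ratio scale, `ratio` can be read directly as "twice the PC," which an
ordinal difficulty ranking could not support. The cutoff calculation shows how a specified
performance time translates into a single predictor score once the PC and aptitude
relationship has been estimated.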


Summary 

This discussion of the role of PC in manpower planning and standards setting has not
offered solutions to some of the thorny problems facing researchers. It has, however, attempted
to offer some ideas to guide the next steps in research. Of most immediate concern is better
understanding of the construct itself. Issues such as aggregation of task-level performance
requirements within specialties, or specification of acceptable minimum performance needs, are
dependent on reliable measurement of PC in the first place.